Contextual Gaussian Process Bandit Optimization
Authors
Abstract
How should we design experiments to maximize performance of a complex system, taking into account uncontrollable environmental conditions? How should we select relevant documents (ads) to display, given information about the user? These tasks can be formalized as contextual bandit problems, where at each round, we receive context (about the experimental conditions, the query), and have to choose an action (parameters, documents). The key challenge is to trade off exploration, by gathering data for estimating the mean payoff function over the context-action space, and exploitation, by choosing an action deemed optimal based on the gathered data. We model the payoff function as a sample from a Gaussian process defined over the joint context-action space, and develop CGP-UCB, an intuitive upper-confidence style algorithm. We show that by mixing and matching kernels for contexts and actions, CGP-UCB can handle a variety of practical applications. We further provide generic tools for deriving regret bounds when using such composite kernel functions. Lastly, we evaluate our algorithm on two case studies, in the context of automated vaccine design and sensor management. We show that context-sensitive optimization outperforms no or naive use of context.
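To make the selection rule concrete, below is a minimal Python sketch of a CGP-UCB-style loop; this is not the authors' code. It uses an anisotropic RBF kernel over the concatenated (context, action) vector, which for RBF kernels coincides with a product of a context kernel and an action kernel, and applies an upper-confidence rule over a finite candidate action set. The class name, the constant exploration weight beta (the paper uses a schedule beta_t), and the finite action set are simplifying assumptions.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

class CGPUCB:
    """Sketch of a contextual GP-UCB loop; names and constants are illustrative."""

    def __init__(self, actions, beta=2.0, noise=0.1):
        self.actions = np.asarray(actions)   # finite candidate action set, shape (K, m)
        self.beta = beta                     # constant stand-in for the paper's beta_t schedule
        self.noise = noise                   # assumed observation noise std
        self.X, self.y = [], []

    def select(self, context):
        context = np.asarray(context)
        # Pair the observed context with every candidate action.
        Z = np.hstack([np.tile(context, (len(self.actions), 1)), self.actions])
        if not self.X:                       # no data yet: pick arbitrarily
            return self.actions[0]
        # Anisotropic RBF on (context, action) = product of context and action RBFs.
        kernel = RBF(length_scale=np.ones(Z.shape[1]))
        gp = GaussianProcessRegressor(kernel=kernel, alpha=self.noise ** 2)
        gp.fit(np.asarray(self.X), np.asarray(self.y))
        mu, sigma = gp.predict(Z, return_std=True)
        # Upper-confidence rule: maximize posterior mean plus scaled uncertainty.
        return self.actions[np.argmax(mu + np.sqrt(self.beta) * sigma)]

    def update(self, context, action, payoff):
        self.X.append(np.concatenate([np.asarray(context), np.asarray(action)]))
        self.y.append(payoff)

Refitting the GP from scratch each round keeps the sketch short; an efficient implementation would update the posterior incrementally.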
Similar Papers
Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation
The Multi-Armed Bandit (MAB) framework has been successfully applied in many web applications. However, many complex real-world applications that involve multiple content recommendations cannot fit into the traditional MAB setting. To address this issue, we consider an ordered combinatorial semi-bandit problem where the learner recommends S actions from a base set of K actions, and displays the res...
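The snippet is truncated, but a standard baseline for such semi-bandit problems scores each base action with a UCB index and plays the S highest-scoring actions in score order (a CUCB-style rule; whether this matches the paper's own algorithm is an assumption, and the function below is illustrative only).

import numpy as np

def ucb_top_s(means, counts, t, S):
    """Return the indices of the S base actions with the largest UCB index at round t >= 1."""
    counts = np.maximum(counts, 1)           # clip to avoid division by zero for unplayed arms
    ucb = means + np.sqrt(1.5 * np.log(max(t, 1)) / counts)
    return np.argsort(ucb)[::-1][:S]         # best-scoring action in the first slot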
On 2-armed Gaussian Bandits and Optimization
We explore the 2-armed bandit with Gaussian payoffs as a theoretical model for optimization. We formulate the problem from a Bayesian perspective, and provide the optimal strategy for both 1 and 2 pulls. We present regions of parameter space where a greedy strategy is provably optimal. We also compare the greedy and optimal strategies to a genetic-algorithm-based strategy. In doing so we correct...
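For intuition, the greedy strategy in a conjugate-Gaussian version of this setup (our simplification: known noise variance and independent Gaussian priors on the arm means) simply pulls the arm with the higher posterior mean:

import numpy as np

def greedy_pull(prior_mu, prior_var, sums, counts, noise_var=1.0):
    """Pick the arm with the higher posterior mean payoff (2-armed Gaussian bandit)."""
    post_mu = np.empty(2)
    for a in range(2):
        # Standard Gaussian conjugate update with known noise variance.
        precision = 1.0 / prior_var[a] + counts[a] / noise_var
        post_mu[a] = (prior_mu[a] / prior_var[a] + sums[a] / noise_var) / precision
    return int(np.argmax(post_mu))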
Nonparametric Contextual Bandit Optimization via Random Approximation
We examine the stochastic contextual bandit problem in a novel continuous-action setting where the policy lies in a reproducing kernel Hilbert space (RKHS). This provides a framework to handle continuous policy and action spaces in a tractable manner while retaining polynomial regret bounds, in contrast with much prior work in the continuous setting. We extend an optimization perspective that h...
Learning and decisions in contextual multi-armed bandit tasks
Contextual Multi-Armed Bandit (CMAB) tasks are a novel framework to assess decision making in uncertain environments. In a CMAB task, participants are presented with multiple options (arms) which are characterized by a number of features (context) related to the reward associated with the arms. By choosing arms repeatedly and observing the reward, participants can learn about the relation betwe...
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP...
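For reference, the GP-UCB selection rule analyzed in this work picks, at round t (in LaTeX notation),

x_t = \arg\max_{x \in D} \; \mu_{t-1}(x) + \beta_t^{1/2}\, \sigma_{t-1}(x)

where \mu_{t-1} and \sigma_{t-1} are the GP posterior mean and standard deviation given the first t-1 observations, and \beta_t controls the exploration-exploitation trade-off. CGP-UCB above applies the same rule over the joint context-action space.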